Data Extraction Fundamentals
Intro
Quiz: Action Time
Assessing the Quality of Data Pt. 1
Assessing the Quality of Data Pt. 2
Tabular Formats
CSV Format
You can download the datafiles from the Supporting Materials link on this page or the Course Materials page.
Pick the data format supported by your spreadsheet application, download the file, open it in the spreadsheet, then export it as a CSV file.
Compare the size of the spreadsheet file with the size of the csv file!
Quiz: Parsing CSV Files
code:
solution:
You can check the data in the dropdown in the top-left corner of the quiz starter code or download the datafile beatles-diskography.csv in the Supporting Materials below.
The Python string method strip() will come in handy for getting rid of extra whitespace, including the newline character at the end of a line.
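As a rough illustration of that tip, the sketch below parses comma-separated lines by hand using strip() and split(). The parse_line helper and the sample lines are made up for this example (Python 3 is assumed), and this is not the quiz solution.

```python
def parse_line(line):
    """Split one comma-separated line into a list of cleaned-up values."""
    # strip() with no arguments removes leading and trailing whitespace,
    # including the newline character at the end of the line.
    return [value.strip() for value in line.strip().split(",")]


# Illustrative sample lines, not copied from the course data file.
header = parse_line("Title, Released ,Label\n")
row = parse_line("Please Please Me,22 March 1963, Parlophone \n")

# Pair each header label with the matching value from the row.
record = dict(zip(header, row))
print(record)
```

A plain split(",") goes wrong as soon as a value contains a comma of its own, which is the kind of case the csv module is designed to handle.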
Quiz: Problematic Line
Using CSV Module
csv.DictReader()
By default, it treats the first row as the field labels.
Each following line is returned as a dict.
code:
You can read more about the python csv module at the link below:
http://docs.python.org/2/library/csv.html
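To make the DictReader behaviour concrete, here is a minimal sketch, assuming Python 3. The in-memory sample data is invented for the illustration and stands in for a real CSV file opened with open().

```python
import csv
import io

# Illustrative in-memory data standing in for a CSV file on disk.
sample = io.StringIO(
    "Title,Released,Label\n"
    "Help!,6 August 1965,Parlophone\n"
    'Revolver,5 August 1966,"Parlophone, UK"\n'
)

# DictReader takes the field names from the first row by default.
reader = csv.DictReader(sample)
rows = list(reader)

for row in rows:
    # Each row behaves like a dictionary keyed by the header labels.
    print(row["Title"], "|", row["Label"])
```

Note that the quoted value containing a comma stays in one field, which a hand-written split(",") would not manage.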
Intro to XLRD
You might find this video a lot more exciting if you run this code locally alongside the video! You can download the datafile from Course Materials. You can also install the xlrd library locally on your computer via pip with the following command:
pip install xlrd
The example code:
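The code from the video is not reproduced in these notes. As a stand-in, the following is only a sketch of common xlrd calls, assuming Python 3 and an .xls workbook whose first row holds column labels. The helper names load_first_sheet and column_range, and the choice of column index 1, are assumptions made for this illustration.

```python
def load_first_sheet(path):
    """Open an .xls workbook and return its first worksheet."""
    # xlrd is a third-party package (pip install xlrd). It is imported
    # here so that column_range below can be used without it.
    import xlrd

    workbook = xlrd.open_workbook(path)
    return workbook.sheet_by_index(0)


def column_range(sheet, col, first_row=1):
    """Return (min, max) of a numeric column, skipping the header row."""
    values = sheet.col_values(col, start_rowx=first_row, end_rowx=sheet.nrows)
    return min(values), max(values)


# Example usage once the datafile has been downloaded:
# sheet = load_first_sheet("2013_ERCOT_Hourly_Load_Data.xls")
# print(sheet.nrows, sheet.ncols)   # size of the sheet
# print(sheet.cell_value(0, 0))     # top-left cell
# print(column_range(sheet, 1))     # assumes column 1 is numeric
```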
Quiz: Reading Excel Files
You can download the “2013_ERCOT_Hourly_Load_Data.xls” datafile from the Supporting Materials section on this page or from this Course Materials page. Note that the code expects the data to be contained in an archive named “2013_ERCOT_Hourly_Load_Data.xls.zip”, so to run it on your local computer you will need to either rename the downloaded archive or modify the code.
You can also install the xlrd library locally on your computer via pip with the following command:
pip install xlrd
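Since the quiz code expects the data inside a zip archive, unpacking it locally might look like the sketch below. It uses only the standard library zipfile module. The extract_archive helper and the throwaway demo archive are assumptions for this illustration and are not the quiz starter code.

```python
import os
import tempfile
import zipfile


def extract_archive(archive_path, target_dir):
    """Extract every file in a zip archive and return the member names."""
    with zipfile.ZipFile(archive_path, "r") as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Demonstration with a throwaway archive so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
demo_zip = os.path.join(workdir, "demo.xls.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.xls", "placeholder contents")

extracted = extract_archive(demo_zip, workdir)
print(extracted)
```

With the real download, the call would take the path to “2013_ERCOT_Hourly_Load_Data.xls.zip” (or whatever name the archive has on your machine) in place of the demo archive.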
code:
solution:
Intro to JSON
Data Modeling in JSON
JSON Resources
Extra Info
If you’re unfamiliar with JSON, or would just like a refresher, W3Schools has a great tutorial on the subject.
You can also check out
You can find information about Python’s json module on this page of the Python documentation. Note that JSON arrays are interpreted as lists and JSON objects as dictionaries, so you can use the standard Python approaches to inspect JSON data. You’ll get some practice exploring some data of this type in the next quiz.
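The mapping from JSON to Python types can be checked with a short sketch like this one, assuming Python 3. The JSON text is a made-up sample and not the quiz data.

```python
import json

# Illustrative JSON text: one object containing an array of objects.
text = """
{
  "artist": "The Beatles",
  "albums": [
    {"title": "Help!", "year": 1965},
    {"title": "Revolver", "year": 1966}
  ]
}
"""

data = json.loads(text)

# A JSON object becomes a dict, a JSON array becomes a list.
print(type(data).__name__)
print(type(data["albums"]).__name__)

# Standard dict and list operations then apply.
titles = [album["title"] for album in data["albums"]]
print(titles)

# json.dumps with indent is handy for inspecting nested data.
print(json.dumps(data, indent=2, sort_keys=True))
```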
Quiz: JSON Playground
You can check the data in the dropdown in the top-left corner of the quiz starter code.
‘Run locally’ means that you have to download the file, or copy its contents, to your local machine, then modify it and run it.
To be able to do that you need to have Python installed, as well as the requests module. Please see the Requests installation documentation. If you have pip, you can install Requests by running the following command:
pip install requests
To learn more about the requests module, see the documentation here.